Abstract

The Vapnik-Chervonenkis dimension of a set $K$ in $\mathbb{R}^n$ is the maximal dimension of the coordinate cube of a given size, which can be found in coordinate projections of $K$. We show that the VC dimension of a convex body governs its entropy. This has a number of consequences, including the optimal Elton's theorem and a uniform central limit theorem in the real valued case.

1 Introduction

Let $x_1, \dots, x_n$ be vectors in the unit ball of a Banach space, and assume that $\mathbb{E}\|\sum_{i=1}^n \varepsilon_i x_i\| \ge \delta n$ for some number $\delta > 0$, where $\varepsilon_1, \dots, \varepsilon_n$ denote independent Bernoulli random variables (taking values $1$ and $-1$ with probability $1/2$). In 1983, J. Elton proved an important result: there exists a subset $\sigma$ of $\{1, \dots, n\}$ of size proportional to $n$ such that the set of vectors $(x_i)_{i \in \sigma}$ is well equivalent to the $\ell_1$ unit-vector basis. Specifically, there exist numbers $s, t > 0$, depending only on $\delta$, such that $|\sigma| \ge sn$ and

$$\Big\|\sum_{i \in \sigma} a_i x_i\Big\| \ge t \sum_{i \in \sigma} |a_i| \quad \text{for all real numbers } (a_i).$$

This result was extended to the complex case by A. Pajor [Pa].
Several steps have been made towards finding asymptotically the largest possible $s$ and $t$ in Elton's Theorem ([Pa], [T]). Trivial upper bounds are that $s \le \delta^2$, which follows from the example of identical vectors, and $t \le \delta$, as demonstrated by shrinking the usual $\ell_1$ unit-vector basis. One of the aims of this paper is to prove Elton's Theorem with $s \ge c\delta^2$ and $t \ge c\delta$, where $c > 0$ is an absolute constant. Furthermore, we show that $s$ and $t$ satisfy $\sqrt{s}\, t \log^{2.5}(2/t) \ge c\delta$, which, as an easy example shows, is optimal for all $\delta$ up to a logarithmic factor. This improves the result of M. Talagrand from [T].

This theorem follows from new entropy estimates of a convex body $K \subset [-1, 1]^n = B_\infty^n$. We show that the entropy of $K$ is controlled by its Vapnik-Chervonenkis dimension. This parameter, denoted by $VC(K, t)$, is defined for every $0 < t < 1$ as the maximal size of a subset $\sigma$ of $\{1, \dots, n\}$ such that the coordinate projection of $K$ onto $\mathbb{R}^\sigma$ contains a coordinate cube of the form $x + [0, t]^\sigma$. This notion carries over to convexity the "classical" concept of the VC dimension, denoted by $VC(A)$, and defined for subsets $A$ of the discrete cube $\{0, 1\}^n$ as the maximal size of a subset $\sigma$ of $\{1, \dots, n\}$ such that $P_\sigma A = \{0, 1\}^\sigma$, where $P_\sigma$ is the coordinate projection onto the coordinates in $\sigma$ (see [LT] §14.3).

Let $B_p^n$ denote the unit ball of $\ell_p^n$, $1 \le p < \infty$, and let us look at the covering numbers $N(K, n^{1/p} B_p^n, t)$, which are the minimal number of translates of $t n^{1/p} B_p^n$ in $\mathbb{R}^n$ needed to cover $K$. A volumetric bound on the entropy (which is the logarithm of the covering numbers) shows that

$$\log N(K, n^{1/p} B_p^n, t) \le \log(5/t) \cdot n.$$



A natural question is whether one can replace the dimension $n$ on the right-hand side of this estimate by the VC dimension $VC(K, ct)$, which is generally smaller. This is perfectly true for the Boolean cube: the known theorem of R. Dudley that led to a characterization of the uniform central limit property in the Boolean case states that if $A \subset \{0, 1\}^n$ then

$$\log N(A, n^{1/2} B_2^n, t) \le C \log(2/t) \cdot VC(A).$$

This estimate follows by a random choice of coordinates and an application of the Sauer-Shelah Lemma (see [LT] Theorem 14.12). The same problem for convex bodies is considerably more difficult, since to bound $VC(K, t)$ one needs to find a cube in $P_\sigma K$ with well separated faces, not merely disjoint ones. We prove the following theorem.
Theorem 1.1 There are absolute constants $C, c > 0$ such that for every convex body $K \subset B_\infty^n$, every $1 \le p < \infty$ and any $0 < t < 1$,

$$\log N(K, n^{1/p} B_p^n, t) \le C p^2 \log^2(2/t) \cdot VC(K, ct). \tag{1}$$

Moreover,

$$\log N(K, B_\infty^n, t) \le C M^2 \log^2(2/t) \cdot VC(K, ct), \tag{2}$$

provided that either the right or the left hand side of (2) is larger than $t^M n$.



Let us comment on estimate (2), which improves the main lemma of [ABCH]. This bound can not hold in general if the coefficient in front of the VC dimension depends only on $t$ and not on $n$, since for $K = B_1^n$ we have $VC(K, t) = 2/t$ and $\log N(K, B_\infty^n, t) \ge \log n$. Next, (2) is best complemented by the easy lower bound

$$\log N(K, B_\infty^n, t) \ge VC(K, ct),$$

for some absolute constant $c > 0$, which follows from the definition of the VC dimension and by a comparison of volumes. These two bounds show that the $\|\cdot\|_\infty$-entropy of $K$ is governed by the VC dimension of $K$, up to a logarithmic factor in $t$.

The relation to the Elton-Pajor Theorem is the following. If $K$ is a symmetric convex body, then $VC(K, t)$ is the maximal cardinality of a subset $\sigma$ of $\{1, \dots, n\}$ such that $\|\sum_{i \in \sigma} a_i e_i\|_{K^\circ} \ge (t/2) \sum_{i \in \sigma} |a_i|$ for all real numbers $(a_i)$, where $e_i$ are the canonical unit vectors in $\mathbb{R}^n$ and $K^\circ$ is the polar of $K$. Note that if $(g_i)$ are independent standard gaussian random variables then $\mathbb{E}\|\sum_{i=1}^n \varepsilon_i x_i\| \le C\, \mathbb{E}\|\sum_{i=1}^n g_i x_i\|$ for every norm ([LT] §4.5). Therefore, our problem reduces to finding a bound on

$$E = \mathbb{E}\Big\|\sum_{i=1}^n g_i e_i\Big\|_{K^\circ}$$

in terms of the VC-dimension of $K$. The latter is relatively easy once we know (1). Indeed, replacing the entropy by the VC dimension in Dudley's entropy inequality, it follows that there are absolute constants $C$ and $c$ such that

$$E \le C \int_0^\infty \sqrt{\log N(K, B_2^n, t)}\, dt \le C \sqrt{n} \int_{cE/n}^1 \sqrt{VC(K, ct)} \cdot \log(2/t)\, dt. \tag{3}$$
This inequality improves the main theorem of M. Talagrand in [T]. Elton's Theorem with optimal asymptotics follows from (3) by comparing the integrand to an appropriately chosen integrable function.

We present a few other applications to convexity. Inequality (3) can be applied, as in [T], to compare two geometric properties of a Banach space called type and infratype. Recall that a Banach space $X$ is of gaussian type $p$ if there exists some $M > 0$ such that for all $n$ and all sequences of vectors $(x_i)_{i \le n}$,

$$\mathbb{E}\Big\|\sum_{i=1}^n g_i x_i\Big\| \le M \Big(\sum_{i=1}^n \|x_i\|^p\Big)^{1/p}. \tag{4}$$

The best possible constant $M$ in this inequality is denoted by $T_p(X)$. Next, $X$ has infratype $p$ if there exists some $M > 0$ such that for all $n$ and all sequences of vectors $(x_i)_{i \le n}$, we have

$$\min_{\eta_i = \pm 1} \Big\|\sum_{i=1}^n \eta_i x_i\Big\| \le M \Big(\sum_{i=1}^n \|x_i\|^p\Big)^{1/p}. \tag{5}$$

The best possible constant $M$ in this inequality is denoted by $I_p(X)$.

M. Talagrand proved in [T] that if $1 < p < 2$ then $T_p(X) \le C(p) I_p(X)^2$, where $C(p)$ is a constant which depends only on $p$. It is not known whether the square can be removed. Moreover, the situation for $p = 2$ is unknown in general, but (3) can be used to show that there is an absolute constant $C$ such that for any $n$-dimensional Banach space $X$,

$$T_2(X) \le I_2(X) \cdot C \log^2\Big(\frac{n}{I_2(X)^2}\Big) \le I_2(X) \cdot C \log^2 n.$$



Finally, we present an application of Theorem 1.1 to empirical processes. 
We use a version of (1) to bound the entropy of an arbitrary subset of 
using a scale-sensitive version of the "classical" VC dimension, known as 
the fat-shattering dimension. In particular we show that if F is a class 
of uniformly bounded functions, which has a relatively small fat-shattering 
dimension, then it satisfies the uniform central limit theorem for any proba- 
bility measure. This extends Dudley's characterization for VC classes to the 
real-valued case. 

The paper is organized as follows. In Section 2 we prove the bound for the $B_p^n$-entropy in abstract finite product spaces, and then derive (1) by approximation. Actually, the convexity of $K$ plays very little role in these results, and similar entropy bounds hold for arbitrary subsets of $B_\infty^n$. In Section 3 we prove (2) for the $B_\infty^n$-entropy by reducing it to (1) through an independent lemma that compares the $B_p^n$-entropy to the $B_\infty^n$-entropy. In Section 4 we apply (1) to convex bodies. In particular, we deduce Elton's Theorem and the infratype results. Finally, in Section 5 we apply (1) to empirical processes.

Throughout this article, positive absolute constants are denoted by C and 
c. Their values may change from line to line, or even within the same line. 

ACKNOWLEDGEMENTS: The second author is thankful to Mark Rudelson, who contributed a lot of effort and enthusiasm to the paper. Warmest thanks are due to Nicole Tomczak-Jaegermann for her constant support. The second author also acknowledges support from the Pacific Institute of Mathematical Sciences, and thanks the Department of Mathematical Sciences of the University of Alberta for hospitality.

2 $B_p^n$-entropy in abstract product spaces

We will introduce and work with the notion of the VC dimension in an abstract setting that encompasses both classes considered in the introduction: the subsets of the discrete cube $\{0, 1\}^n$ and the class of convex bodies in $\mathbb{R}^n$.

We call a map $d : T \times T \to \mathbb{R}_+$ a quasi-metric if $d$ is symmetric and reflexive (that is, for all $x, y$, $d(x, y) = d(y, x)$ and $d(x, x) = 0$). We say that points $x$ and $y$ in $T$ are separated if $d(x, y) > 0$. Thus, $d$ does not necessarily separate points or satisfy the triangle inequality.

Definition 2.1 Let $(T, d)$ be a quasi-metric space and let $n$ be a positive integer. For a set $A \subset T^n$ and $t > 0$, the VC-dimension $VC(A, t)$ is the maximal cardinality of a subset $\sigma \subset \{1, \dots, n\}$ such that the inclusion

$$P_\sigma A \supseteq \prod_{i \in \sigma} \{a_i, b_i\} \tag{6}$$

holds for some points $a_i, b_i \in T$, $i \in \sigma$, with $d(a_i, b_i) \ge t$. If no such $\sigma$ exists, we set $VC(A, t) = 0$. When there is a need to specify the underlying metric, we denote the VC dimension by $VC_d(A, t)$.
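For small finite examples, Definition 2.1 can be evaluated directly by exhaustive search. The following is only a sketch under stated assumptions: the function name `vc_dimension_at_scale` is hypothetical, points of $T^n$ are encoded as Python tuples, and the quasi-metric is passed as a symmetric function `d`. The search is exponential in $n$, so it is meant for illustration rather than computation at scale.

```python
from itertools import combinations, product

def vc_dimension_at_scale(A, d, t):
    """Brute-force VC(A, t) for a finite set A of equal-length tuples.

    d is assumed to be symmetric with d(x, x) == 0 (a quasi-metric on T).
    """
    A = [tuple(x) for x in A]
    if not A:
        return 0
    n = len(A[0])
    # For each coordinate, list the pairs of occurring values at distance >= t.
    pairs = []
    for i in range(n):
        values = list({x[i] for x in A})
        pairs.append([(a, b) for a, b in combinations(values, 2) if d(a, b) >= t])
    best = 0
    for k in range(1, n + 1):
        found = False
        for sigma in combinations(range(n), k):
            projection = {tuple(x[i] for i in sigma) for x in A}
            for choice in product(*(pairs[i] for i in sigma)):
                # product(*choice) enumerates the 2^k vertices of the cube.
                if all(vertex in projection for vertex in product(*choice)):
                    found = True
                    break
            if found:
                break
        if not found:
            break  # faces of an embedded cube embed too, so larger sigma cannot work
        best = k
    return best
```

With $T = \{1, 2, 3\}$ and $d(a, b) = |a - b|$, the set consisting of the four corners of $\{1, 3\}^2$ together with the point $(2, 2)$ has dimension 2 at scale $t = 2$ and dimension 0 at scale $t = 3$.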
Since $VC(A, t)$ is decreasing in $t$ and is bounded by $n$, which is the "usual" dimension of the product space, the limit

$$VC(A) := \lim_{t \to 0^+} VC(A, t)$$

always exists. Equivalently, $VC(A)$ is the maximal cardinality of a subset $\sigma \subset \{1, \dots, n\}$ such that (6) holds for some pairs $(a_i, b_i)$ of separated points in $T$.

This definition is an extension of the "classical" VC dimension for subsets of the discrete cube $\{0, 1\}^n$, where we think of $\{0, 1\}$ as a metric space with the $0$-$1$ metric. Clearly, for any set $A \subset \{0, 1\}^n$ the quantity $VC(A, t)$ does not depend on $0 < t < 1$, and hence

$$VC(A) = \max\{|\sigma| : \sigma \subset \{1, \dots, n\},\ P_\sigma A = \{0, 1\}^\sigma\},$$

which is precisely the "classical" definition of the VC dimension.

The other example discussed in the introduction was the VC dimension of convex bodies. Here $T = \mathbb{R}$ or, more frequently, $T = [-1, 1]$, both with respect to the usual metric. If $K \subset T^n$ is a convex body, then $VC(K, t)$ is the maximal cardinality of a subset $\sigma \subset \{1, \dots, n\}$ for which the inclusion

$$P_\sigma K \supseteq x + (t/2) B_\infty^\sigma$$

holds for some vector $x \in \mathbb{R}^\sigma$ (which automatically lies in $P_\sigma K$). It is easy to see that if $K$ is symmetric, we can set $x = 0$. Also note that for every convex body $VC(K) = n$.
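As a worked example of this definition, consider $K = B_p^n$ with integer $p \ge 1$. Since $K$ is symmetric one may take $x = 0$, and the cube $(t/2) B_\infty^\sigma$ lies in $B_p^\sigma$ exactly when its vertices do, that is, when $|\sigma| (t/2)^p \le 1$. The sketch below evaluates the resulting closed form; the function name `vc_lp_ball` is hypothetical, and exact rational arithmetic is used (an implementation choice made here) to avoid floating-point rounding at the threshold.

```python
from fractions import Fraction
from math import floor

def vc_lp_ball(n, p, t):
    """VC(B_p^n, t) for integer p >= 1: the largest k <= n with k * (t/2)^p <= 1.

    Pass t as a Fraction (or a string such as "1/10") for exact arithmetic.
    """
    t = Fraction(t)
    return min(n, floor((Fraction(2) / t) ** p))
```

For $p = 1$ and $n \ge 2/t$ this gives $\lfloor 2/t \rfloor$, in agreement with the value of $VC(B_1^n, t)$ quoted in the introduction.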

The main results of this article rely on (and are easily reduced to) a discrete problem: to estimate the VC-dimension of a set in a product space $T^n$, where $(T, d)$ is a finite quasi-metric space. $T^n$ is usually endowed with the normalized Hamming quasi-metric $d_n(x, y) = \frac{1}{n} \sum_{i=1}^n d(x(i), y(i))$ for $x, y \in T^n$.

In the main result of this section we bound the entropy of a set $A \subset T^n$ with respect to $d_n$ in terms of $VC(A)$.

Theorem 2.2 Let $(T, d)$ be a finite quasi-metric space with $\mathrm{diam}(T) \le 1$, and set $n$ to be a positive integer. Then, for every set $A \subset T^n$ and every $0 < \varepsilon < 1$,

$$\log N(A, d_n, \varepsilon) \le C \log^2(|T|/\varepsilon) \cdot VC(A),$$

where $C$ is an absolute constant.
Before presenting the proof, let us make two standard observations. We say that points $x, y \in T^n$ are separated on the coordinate $i_0$ if $x(i_0)$ and $y(i_0)$ are separated. Points $x$ and $y$ are called $\varepsilon$-separated if $d_n(x, y) \ge \varepsilon$.

Clearly, if $A'$ is a maximal $\varepsilon$-separated subset of $A$ then $|A'| \ge N(A, d_n, \varepsilon)$. Moreover, the definition of $d_n$ and the fact that $\mathrm{diam}(T) \le 1$ imply that every two distinct points in $A'$ are separated on at least $\varepsilon n$ coordinates. This shows that Theorem 2.2 may be reduced to the following statement.
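The two observations can be illustrated on a small example. The sketch below is an assumption-laden illustration, not part of the argument: the helper names are hypothetical, points are tuples, and the maximal $\varepsilon$-separated subset is built greedily (any maximal subset would do).

```python
def normalized_hamming(x, y, d):
    """d_n(x, y): the average of d over the coordinates."""
    return sum(d(a, b) for a, b in zip(x, y)) / len(x)

def separated_coordinates(x, y, d):
    """Number of coordinates on which x and y are separated, i.e. d > 0."""
    return sum(1 for a, b in zip(x, y) if d(a, b) > 0)

def maximal_separated_subset(A, d, eps):
    """Greedily pick points that are eps-separated from all points picked so far."""
    chosen = []
    for x in A:
        if all(normalized_hamming(x, y, d) >= eps for y in chosen):
            chosen.append(x)
    return chosen
```

By construction every point left out is within $d_n$-distance less than $\varepsilon$ of a chosen point, which is the covering property used above; and when $d \le 1$, each pair of chosen points must be separated on at least $\varepsilon n$ coordinates.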



Theorem 2.3 Let $(T, d)$ be a quasi-metric space for which $\mathrm{diam}(T) \le 1$. Let $0 < \varepsilon < 1$ and consider a set $A \subset T^n$ such that every two distinct points in $A$ are separated on at least $\varepsilon n$ coordinates. Then

$$\log |A| \le C \log^2(|T|/\varepsilon) \cdot VC(A). \tag{7}$$

The first step in the proof of Theorem 2.3 is a probabilistic extraction principle, which allows one to reduce the number of coordinates without changing the separation assumption by much. Its proof is based on a simple discrepancy bound for a set system.

Lemma 2.4 There exists an absolute constant $c > 0$ for which the following holds. Let $\varepsilon > 0$ and assume that $\mathcal{S}$ is a system of subsets of $\{1, \dots, n\}$ which satisfies that each $S \in \mathcal{S}$ contains at least $\varepsilon n$ elements. Let $k \le n$ be an integer such that $\log |\mathcal{S}| \le c \varepsilon k$. Then there exists a subset $I \subset \{1, \dots, n\}$ of cardinality $|I| = k$, such that

$$|I \cap S| \ge \varepsilon k / 4 \quad \text{for all } S \in \mathcal{S}.$$

Proof. If $|\mathcal{S}| = 1$ the lemma is trivially true, hence we may assume that $|\mathcal{S}| \ge 2$. Let $0 < \delta < 1/2$ and set $\delta_1, \dots, \delta_n$ to be $\{0, 1\}$-valued independent random variables with $\mathbb{E}\delta_i = \delta$ for all $i$. By the classical bounds on the tails of the binomial law (see [LT] 6.3 for more general inequalities), there is an absolute constant $c_0 > 0$ for which

$$\mathbb{P}\Big\{\Big|\sum_{i=1}^n (\delta_i - \delta)\Big| > \tfrac{1}{2}\delta n\Big\} \le 2\exp(-c_0 \delta n). \tag{8}$$

Let $\delta = k/2n$ and consider the random set $I = \{i : \delta_i = 1\}$. For any set $B \subset \{1, \dots, n\}$, $|I \cap B| = \sum_{i \in B} \delta_i$. Then (8) implies that

$$\mathbb{P}\{|I \cap B| \ge \delta |B| / 2\} \ge 1 - 2\exp(-c_0 \delta |B|).$$

Since for every $S \in \mathcal{S}$, $|S| \ge \varepsilon n$, then

$$\mathbb{P}\{|I \cap S| \ge \varepsilon k / 4\} \ge 1 - 2\exp(-c_0 \varepsilon k / 2).$$

Therefore,

$$\mathbb{P}\{\forall S \in \mathcal{S},\ |I \cap S| \ge \varepsilon k / 4\} \ge 1 - 2|\mathcal{S}| \exp(-c_0 \varepsilon k / 2).$$

By the assumption on $k$, this quantity is larger than $1/2$ (with an appropriately chosen absolute constant $c$). Moreover, by a similar argument, $|I| \le k$ with probability larger than $1/2$. This proves the existence of a set $I$ satisfying the conclusion of the lemma. ■
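The random selection used in this proof is easy to simulate. The sketch below follows the same recipe (each coordinate is kept independently with probability $\delta = k/2n$) and simply retries until the sampled set works; the function names, the retry loop and the final padding step that brings $|I|$ up to exactly $k$ are illustrative choices made here, not taken from the text.

```python
import random

def random_coordinate_set(n, k, rng):
    """Keep each coordinate independently with probability delta = k / (2n)."""
    delta = k / (2 * n)
    return {i for i in range(n) if rng.random() < delta}

def extract_coordinates(sets, n, k, eps, rng, max_tries=1000):
    """Search for I with |I| = k and |I & S| >= eps * k / 4 for every S in sets.

    Returns None if no sampled set works within max_tries attempts.
    """
    for _ in range(max_tries):
        I = random_coordinate_set(n, k, rng)
        if len(I) <= k and all(len(I & S) >= eps * k / 4 for S in sets):
            # Adding coordinates can only enlarge the intersections.
            padding = [i for i in range(n) if i not in I][: k - len(I)]
            return I | set(padding)
    return None
```

In the regime of the lemma a single attempt succeeds with positive probability, so in practice the loop typically ends after a few draws.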

Proof of Theorem 2.3. We may assume that $|T| \ge 2$, $\varepsilon \le 1/2$, $n \ge 2$ and $\max(4, \exp(4c)) \le |A| \le |T|^n$, where $0 < c < 1$ is the constant in Lemma 2.4. The first step in the proof is to use the previous lemma, which enables one to make the additional assumption that $\log |A| \ge c\varepsilon n / 4$. Indeed, assume that the converse inequality holds, and for every pair of distinct points $x, y \in A$, let $S(x, y) \subset \{1, \dots, n\}$ be the set of coordinates on which $x$ and $y$ are separated. Put $\mathcal{S}$ to be the collection of the sets $S(x, y)$ and let $k$ be the minimal positive integer for which $\log |\mathcal{S}| \le c\varepsilon k$. Since $|A| \le |\mathcal{S}| \le |A|^2$, then

$$c\varepsilon(k - 1) < \log |\mathcal{S}| \le 2 \log |A| \le \tfrac{1}{2} c \varepsilon n,$$

which implies that $1 < k < n$. Thus, by Lemma 2.4 there is a set $I \subset \{1, \dots, n\}$, $|I| = k$, with the property that every pair of distinct points $x, y \in A$ is separated on at least $\varepsilon |I| / 4$ coordinates in $I$. Also, since $4c \le \log |A| \le \log |\mathcal{S}| \le c\varepsilon k$, then $\varepsilon |I| / 4 \ge 1$ and thus $|P_I A| = |A|$. Clearly, to prove the assertion of the theorem for the set $A \subset T^n$, it is sufficient to prove it for the set $P_I A \subset T^I$ (with $|I|$ instead of $n$), whose cardinality already satisfies $\log |P_I A| = \log |A| \ge c\varepsilon(k - 1)/2 \ge c\varepsilon |I| / 4$. Therefore, we can assume that $|A| = \exp(\alpha n)$ with $\alpha \ge c\varepsilon$ for some absolute constant $c$.

The next step in the proof is a counting argument, which is based on the proof of Lemma 3.3 in [ABCH].

A set is called a cube if it is of the form $D_\sigma = \prod_{i \in \sigma} \{a_i, b_i\}$, where $\sigma$ is a subset of $\{1, \dots, n\}$ and $a_i, b_i \in T$. We will be interested only in large cubes, which are the cubes in which $a_i$ and $b_i$ are separated for all $i \in \sigma$. Given a set $B \subset T^n$, we say that a cube $D_\sigma$ embeds into $B$ if $D_\sigma \subset P_\sigma B$. Note that if a large cube with $|\sigma| \ge v$ embeds into $B$ then $VC(B) \ge v$.
For all $m \ge 2$, $n \ge 1$ and $0 < \varepsilon \le 1/2$, let $t_\varepsilon(m, n)$ denote the maximal number $t$ such that for every set $B \subset T^n$, $|B| = m$, which satisfies the separation condition we imposed (that is, every two distinct points $x, y \in B$ are separated on at least $\varepsilon n$ coordinates), there exist $t$ large cubes that embed into $B$. If no such $B$ exists, we set $t_\varepsilon(m, n)$ to be infinite.

The number of possible large cubes $D_\sigma$ for $|\sigma| \le v$ is smaller than $\sum_{k=1}^v \binom{n}{k} |T|^{2k}$, as for every $\sigma$ of cardinality $k$ there are less than $|T|^{2k}$ possibilities to choose $D_\sigma$. Therefore, if $t_\varepsilon(|A|, n) \ge \sum_{k=1}^v \binom{n}{k} |T|^{2k}$, there exists a large cube $D_\sigma$ for some $|\sigma| > v$ that embeds into $A$, implying that $VC(A) > v$. Thus, to prove the theorem, it suffices to estimate $t_\varepsilon(m, n)$ from below. To that end, we will show that for every $n \ge 2$, $m \ge 1$ and $0 < \varepsilon \le 1/2$,

$$t_\varepsilon(2m \cdot |T|^2/\varepsilon, n) \ge 2 t_\varepsilon(2m, n - 1). \tag{9}$$

Indeed, fix any set $B \subset T^n$ of cardinality $|B| = 2m \cdot |T|^2/\varepsilon$, which satisfies the separation condition above. If no such $B$ exists then $t_\varepsilon(2m \cdot |T|^2/\varepsilon, n) = \infty$, and (9) holds trivially. Split $B$ arbitrarily into $m \cdot |T|^2/\varepsilon$ pairs, and denote the set of the pairs by $\mathcal{P}$. For each pair $(x, y) \in \mathcal{P}$ let $I(x, y) \subset \{1, \dots, n\}$ be the set of the coordinates on which $x$ and $y$ are separated, and note that by the separation condition, $|I(x, y)| \ge \varepsilon n$.

Let $i_0$ be the random coordinate, that is, a random variable uniformly distributed in $\{1, \dots, n\}$. The expected number of the pairs $(x, y) \in \mathcal{P}$ for which $i_0 \in I(x, y)$ is

$$\mathbb{E} \sum_{(x, y) \in \mathcal{P}} \mathbf{1}_{\{i_0 \in I(x, y)\}} = \sum_{(x, y) \in \mathcal{P}} \mathbb{P}\{i_0 \in I(x, y)\} \ge |\mathcal{P}| \cdot \varepsilon = m |T|^2.$$

Hence, there is a coordinate $i_0$ on which at least $m|T|^2$ pairs $(x, y) \in \mathcal{P}$ are separated. By the pigeonhole principle, there are at least $m|T|^2 / \binom{|T|}{2} \ge 2m$ pairs $(x, y) \in \mathcal{P}$ for which the (unordered) set $\{x(i_0), y(i_0)\}$ is the same.

Let $I = \{1, \dots, n\} \setminus \{i_0\}$. It follows that there are two subsets of $B$, denoted by $B_1$ and $B_2$, such that $|B_1| = |B_2| = 2m$ and

$$B_1 \subset \{b_1\} \times T^I, \qquad B_2 \subset \{b_2\} \times T^I$$

for some separated points $b_1, b_2 \in T$. Clearly, the set $B_1$ satisfies the separation condition and so does $B_2$. It is also clear that if a large cube $D_\sigma$ embeds into $B_1$, then it also embeds into $B$, and the same holds for $B_2$. Moreover, if the same cube embeds into both $B_1$ and $B_2$, then the large cube $\{b_1, b_2\} \times D_\sigma$ embeds into $B$ (since $\{b_1, b_2\} \times D_\sigma \subset P_{\{i_0\} \cup \sigma} B$). Therefore, $t_\varepsilon(|B|, n) \ge 2 t_\varepsilon(|B_1|, n - 1) = 2 t_\varepsilon(2m, n - 1)$, establishing (9).

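Since $t_\varepsilon(2, n) \ge 1$, an induction argument yields that $t_\varepsilon(2(|T|^2/\varepsilon)^r, n) \ge 2^r$ for every $r \ge 1$. Thus, for every $m \ge 4$,

$$t_\varepsilon(m, n) \ge m^{\frac{1}{2 \log(|T|^2/\varepsilon)}}.$$

(It is remarkable that the right hand side does not depend on $n$.) Therefore, $VC(A) > v$ provided that $v$ satisfies

$$|A|^{\frac{1}{2 \log(|T|^2/\varepsilon)}} \ge \sum_{k=1}^v \binom{n}{k} |T|^{2k}. \tag{10}$$

To estimate $v$, one can bound the right-hand side of (10) using Stirling's approximation $\sum_{k=1}^v \binom{n}{k} \le [\gamma^\gamma (1 - \gamma)^{1 - \gamma}]^{-n}$, where $\gamma = v/n \le 1/2$. It follows that for $v \le n/2$, $\sum_{k=1}^v \binom{n}{k} |T|^{2k} \le \big(\frac{|T| n}{v}\big)^{2v}$. Taking logarithms in (10) and recalling that $|A| = \exp(\alpha n)$, we seek integers $v \le n/2$ satisfying

$$\frac{\alpha n}{2 \log(|T|^2/\varepsilon)} \ge 2v \log\Big(\frac{|T| n}{v}\Big).$$

This holds if

$$v \le \frac{\alpha n}{8 \log(|T|^2/\varepsilon)} \Big/ \log\Big(\frac{4 |T| \log(|T|^2/\varepsilon)}{\alpha}\Big),$$

proving our assertion since $\alpha \ge c\varepsilon$. ■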

Corollary 2.5 Let $n \ge 2$ and $p \ge 2$ be integers, set $0 < \varepsilon < 1$ and $q > 0$. Consider a set $A \subset \{1, \dots, p\}^n$ such that for every two distinct points $x, y \in A$, $|x(i) - y(i)| \ge q$ for at least $\varepsilon n$ coordinates $i$. Then

$$\log |A| \le C \log^2(p/\varepsilon) \cdot VC(A, q).$$

Proof. We can assume that $q \ge 1$. Define the following quasi-metric on $T = \{1, \dots, p\}$:

$$d(a, b) = \begin{cases} 0 & \text{if } |a - b| < q, \\ 1 & \text{otherwise.} \end{cases}$$

Then $N(A, d_n, \varepsilon) = |A|$. By Theorem 2.2,

$$\log |A| \le C \log^2(p/\varepsilon) \cdot VC_d(A),$$

which completes the proof by the definition of the metric $d$. ■
Now we pass from the discrete setting to the "continuous" one: namely, we study subsets of $B_\infty^n$. Recall that the Minkowski sum of two convex bodies $A, B \subset \mathbb{R}^n$ is defined as $A + B = \{a + b \mid a \in A,\ b \in B\}$.

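Corollary 2.6 For every $A \subset B_\infty^n$, $0 < t < 1$ and $0 < \varepsilon < 1$,

$$\log N(A, \sqrt{n} B_2^n, t) \le C \log^2(2/t\varepsilon) \cdot VC(A + \varepsilon B_\infty^n, t/2).$$

Proof. Clearly, we may assume that $\varepsilon \le t/4$. Put $p = \frac{1}{2\varepsilon}$ and let

$$T = \{-2\varepsilon p, -2\varepsilon(p - 1), \dots, -2\varepsilon, 0, 2\varepsilon, \dots, 2\varepsilon(p - 1), 2\varepsilon p\}.$$

Since $t - \varepsilon \ge 3t/4$, then by approximation one can find a subset $A_1 \subset T^n$ for which $A_1 \subset A + \varepsilon B_\infty^n$ and $N(A_1, \sqrt{n} B_2^n, t - \varepsilon) \ge N(A, \sqrt{n} B_2^n, t)$. Therefore, there exists a subset $A_2 \subset A_1$ of cardinality $|A_2| \ge N(A, \sqrt{n} B_2^n, t)$, which is $\frac{3t}{4}\sqrt{n}$-separated with respect to the $\|\cdot\|_2$-norm. Note that every two distinct points $x, y \in A_2$ satisfy

$$\sum_{i=1}^n |x(i) - y(i)|^2 \ge (9t^2/16) n \ge t^2 n / 2$$

and $|x(i) - y(i)|^2 \le 4$ for all $i$. Hence $|x(i) - y(i)| \ge t/2$ on at least $t^2 n / 16$ coordinates $i$. By Corollary 2.5 applied to $A_2$,

$$\log |A_2| \le C \log^2(2/t\varepsilon) \cdot VC(A_2, t/2),$$

and since $A_2 \subset A_1 \subset A + \varepsilon B_\infty^n$, our claim follows. ■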
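The count of coordinates with a gap of at least $t/2$ in the proof above can be checked by splitting the sum according to the size of each term. The following is a sketch of that step; the set $G$ of "good" coordinates is notation introduced here for illustration only.

```latex
% G is a hypothetical name for the set of coordinates with a large gap.
Let $G = \{\, i : |x(i) - y(i)| \ge t/2 \,\}$.
Since $|x(i) - y(i)|^2 \le 4$ for all $i$, and $|x(i) - y(i)|^2 < t^2/4$ for $i \notin G$,
\[
  \frac{t^2 n}{2}
  \;\le\; \sum_{i=1}^n |x(i) - y(i)|^2
  \;\le\; 4\,|G| + \frac{t^2}{4}\,\bigl(n - |G|\bigr)
  \;\le\; 4\,|G| + \frac{t^2 n}{4},
\]
so that $|G| \ge t^2 n / 16$.
```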

From this we derive the entropy estimate (1).

Corollary 2.7 There exists an absolute constant $C$ such that for any convex body $K \subset B_\infty^n$ and every $0 < t < 1$,

$$\log N(K, \sqrt{n} B_2^n, t) \le C \log^2(2/t) \cdot VC(K, t/4).$$

Proof. This estimate follows from Corollary 2.6 by selecting $\varepsilon = t/4$ and recalling the fact that for every convex body $K$ and every $0 < b < a$,

$$VC(K + b B_\infty^n, a) \le VC(K, a - b).$$

The latter inequality is a consequence of the definition of the VC-dimension and the observation that if $0 < b < a$ are such that $a B_\infty^n \subset K + b B_\infty^n$, then $(a - b) B_\infty^n \subset K$. ■

Note that Corollary 2.6 and Corollary 2.7 can be extended to the case where the covering numbers are computed with respect to $n^{1/p} B_p^n$ for $1 \le p < \infty$, thus establishing the complete claim in (1).
3 $B_\infty^n$-entropy

In this section we prove estimate (2), which improves the main combinatorial result in [ABCH]. Our result can be equivalently stated as follows.

Theorem 3.1 Let $K \subset B_\infty^n$ be a convex body, set $t > 0$ and put $v = VC(K, t/8)$. Then,

$$\log N(K, B_\infty^n, t) \le C v \cdot \log^2(n/tv), \tag{11}$$

where $C$ is an absolute constant.



This estimate should be compared with the Sauer-Shelah lemma for subsets of the Boolean cube $\{0, 1\}^n$. It says that if $A \subset \{0, 1\}^n$ then for $v = VC(A)$ we have $|A| \le \binom{n}{0} + \binom{n}{1} + \dots + \binom{n}{v}$, so that

$$\log |A| \le 2v \cdot \log(n/v)$$

(and note that, of course, $|A| = N(A, B_\infty^n, t)$ for all $0 < t < 1/2$).
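The Sauer-Shelah bound can be checked by brute force on small examples. The sketch below uses hypothetical helper names, encodes points of $\{0,1\}^n$ as tuples, and is exponential in $n$, so it is only meant to illustrate the statement.

```python
from itertools import combinations, product
from math import comb

def is_shattered(A, sigma):
    """True if the projection of A onto the coordinates sigma is all of {0,1}^sigma."""
    projection = {tuple(x[i] for i in sigma) for x in A}
    return len(projection) == 2 ** len(sigma)

def vc_dimension(A, n):
    """Classical VC dimension of a set A of 0/1 tuples of length n."""
    best = 0
    for k in range(1, n + 1):
        if not any(is_shattered(A, s) for s in combinations(range(n), k)):
            break  # subsets of shattered sets are shattered, so we can stop here
        best = k
    return best

def sauer_shelah_bound(n, v):
    """The sum of binomial coefficients C(n, 0) + ... + C(n, v)."""
    return sum(comb(n, k) for k in range(v + 1))

# Example: all 0/1 vectors of length 5 with at most two ones.
n = 5
A = {x for x in product((0, 1), repeat=n) if sum(x) <= 2}
v = vc_dimension(A, n)
```

In this example $v = 2$ and $|A| = 1 + 5 + 10 = 16$, so the Sauer-Shelah bound is attained with equality.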

We reduce the proof of (11) to an application of the $B_p^n$-entropy estimate (1). As a start, note that for $p = \log n$, $B_\infty^n \subset n^{1/p} B_p^n \subset e B_\infty^n$. Therefore, an application of (1) for this value of $p$ yields

$$\log N(K, B_\infty^n, t) \le C v \log^4(n/t),$$

which is slightly worse than (11).

To deduce (11) we need a result that compares the $B_\infty^n$-entropy to the $B_p^n$-entropy, and which may be useful in other applications as well.

Lemma 3.2 There is an absolute constant $c > 0$ such that the following holds. Let $A$ be a subset of $B_\infty^n$ such that every two distinct points $x, y \in A$ satisfy $\|x - y\|_\infty \ge t$. Then, for every integer $1 \le k \le n/2$, there exists a subset $A' \subset A$ of cardinality

$$|A'| \ge \binom{n}{k}^{-1} (ct)^k |A|$$

with the property that every two distinct points in $A'$ satisfy $|x(i) - y(i)| \ge t/2$ for at least $k$ coordinates $i$.
Proof. We can assume that $0 < t < 1/8$. Set $s = t/2$. The separation assumption implies that $N(A, B_\infty^n, s) \ge |A|$. Denote by $D_k$ the set of all points $x$ in $\mathbb{R}^n$ for which $|x(i)| \ge 1$ on at most $k$ coordinates $i$. One can see that $N(A, D_k, s) = N(A, sD_k, 1) = N(A, sD_k \cap 3B_\infty^n, 1)$. Then, by the submultiplicative property of the covering numbers,

$$N(A, B_\infty^n, s) \le N(A, sD_k \cap 3B_\infty^n, 1) \cdot N(sD_k \cap 3B_\infty^n, B_\infty^n, s) \le N(A, sD_k, 1) \cdot N(sD_k \cap 3B_\infty^n, B_\infty^n, s). \tag{12}$$

To bound the second term, write $D_k$ as

$$D_k = \bigcup_{|\sigma| = k} \big(\mathbb{R}^\sigma + (-1, 1)^{\sigma^c}\big),$$

where the union is taken with respect to all subsets $\sigma \subset \{1, \dots, n\}$, and the sum in the right-hand side is the Minkowski sum. Thus,

$$sD_k \cap 3B_\infty^n = \bigcup_{|\sigma| = k} \big(3B_\infty^\sigma + (-s, s)^{\sigma^c}\big).$$

Denote by $N'(A, B, t)$ the number of translates of $tB$ by vectors in $A$ needed to cover $A$. Therefore,

$$N(sD_k \cap 3B_\infty^n, B_\infty^n, s) \le \sum_{|\sigma| = k} N\big(3B_\infty^\sigma + (-s, s)^{\sigma^c}, B_\infty^n, s\big) \le \sum_{|\sigma| = k} N'(3B_\infty^\sigma, B_\infty^\sigma, s).$$

The latter inequality holds because any cover of $3B_\infty^\sigma$ by translates of $sB_\infty^n$ automatically covers $3B_\infty^\sigma + (-s, s)^{\sigma^c}$. Hence, for some absolute constant $C$,

$$N(sD_k \cap 3B_\infty^n, B_\infty^n, s) \le \binom{n}{k} (C/s)^k$$

by a comparison of the volumes, and by (12) we obtain

$$N(A, D_k, s) \ge \binom{n}{k}^{-1} (cs)^k N(A, B_\infty^n, s) \ge \binom{n}{k}^{-1} (ct)^k |A|,$$

from which the statement of the lemma follows by the definition of $D_k$. ■
Now we can compare the $B_\infty^n$-entropy of $K$ to the $B_1^n$-entropy of $K$.

Corollary 3.3 Let $A \subset B_\infty^n$ be a set, and set $0 < t < 1$ and $0 < \varepsilon < t/8$. Then

$$N(A, B_\infty^n, t) \le \Big(\frac{C}{\varepsilon}\Big)^{4\varepsilon n / t} N(A, n B_1^n, \varepsilon),$$

where $C$ is an absolute constant.

Proof. Note that the set $A'$ in the conclusion of Lemma 3.2 is such that every two distinct points $x, y \in A'$ satisfy $\|x - y\|_1 \ge (t/2) k$. Thus $A'$ is $(t/2)k$-separated in the $\|\cdot\|_1$-norm, implying that $|A'| \le N(A, B_1^n, (t/4) k)$. By Lemma 3.2,

$$N(A, B_\infty^n, t) \le \binom{n}{k} (C/t)^k N(A, B_1^n, (t/4) k) \le \Big(\frac{Cn}{tk}\Big)^k N\Big(A, n B_1^n, \frac{tk}{4n}\Big).$$

The conclusion follows by choosing $k$ which satisfies $\frac{tk}{4n} = \varepsilon$. ■

Proof of Theorem 3.1. Fix $0 < t < 1$, and let $\alpha$ be defined by $N(K, B_\infty^n, t) = \exp(\alpha n)$. Hence, there exists a set $A \subset K$ of cardinality $|A| = \exp(\alpha n)$, where every two distinct points $x, y \in A$ satisfy $\|x - y\|_\infty \ge t$. Applying Lemma 3.2 we obtain a subset $A' \subset A \subset K$ of cardinality

$$|A'| \ge \binom{n}{k}^{-1} (ct)^k e^{\alpha n}$$

such that for every two distinct points in $A'$, $|x(i) - y(i)| \ge t/2$ on at least $k$ coordinates $i$. Selecting $k = \frac{c \alpha n}{\log(2/t\alpha)}$ we see that $|A'| \ge e^{\alpha n / 2}$.

The proof is completed by discretizing $A'$ and applying Corollary 2.5 with $p = 4/t$ and $\varepsilon = k/n$ in the same manner as we did in the previous section. Therefore

$$\alpha n / 2 \le \log |A'| \le C \log^2\Big(\frac{4n}{tk}\Big) \cdot VC(A' + (t/4) B_\infty^n, t/2) \le C \log^2(2/t\alpha) \cdot VC(K, t/4),$$

and thus

$$\alpha n \le C \log^2(n/tv) \cdot v,$$

as claimed. ■
4 Applications to convex bodies

We start by presenting an improvement of the main result of M. Talagrand from [T].

Theorem 4.1 There are absolute constants $C, c > 0$ such that for every convex body $K \subset B_\infty^n$,

$$E \le C \sqrt{n} \int_{cE/n}^1 \sqrt{VC(K, ct)} \cdot \log(2/t)\, dt,$$

where $E = \mathbb{E}\|\sum_{i=1}^n g_i e_i\|_{K^\circ}$, and $(e_i)_{i=1}^n$ is the canonical vector basis in $\mathbb{R}^n$.

For the proof, we need a few standard definitions and facts from the local theory of Banach spaces, which may be found in [MS].

Given an integer $n$, let $S^{n-1}$ be the unit Euclidean sphere with the normalized Lebesgue measure $\sigma_n$, and for every measurable set $A \subset \mathbb{R}^n$ denote by $\mathrm{vol}(A)$ its Lebesgue measure in $\mathbb{R}^n$. For a convex body $K$ in $\mathbb{R}^n$, put $M_K = \int_{S^{n-1}} \|x\|_K \, d\sigma_n(x)$ and let $M_K^*$ denote $M_{K^\circ}$, where $K^\circ$ is the polar of $K$. Recall that for any two convex bodies $K$ and $L$, $M_{K+L}^* \le M_K^* + M_L^*$.

Urysohn's inequality states that $\big(\frac{\mathrm{vol}(K)}{\mathrm{vol}(B_2^n)}\big)^{1/n} \le M_K^*$.

Next, put $\ell(K) = \mathbb{E}\|\sum_{i=1}^n g_i e_i\|_K$, where $(g_i)_{i=1}^n$ are independent standard gaussian random variables and $(e_i)_{i=1}^n$ is the canonical basis of $\mathbb{R}^n$. It is well known that $\ell(K) = c_n \sqrt{n} M_K$, where $c_n < 1$ and $c_n \to 1$ as $n \to \infty$. Recall that by Dudley's inequality (see [Fi]) there is an absolute constant $C_1$ such that for every convex body $K$,

$$\ell(K^\circ) \le C_1 \int_0^\infty \sqrt{\log N(K, B_2^n, \varepsilon)}\, d\varepsilon.$$

It is possible to slightly improve Dudley's inequality using an additional volumetric argument. This observation is due to A. Pajor.

Lemma 4.2 There exist absolute constants $C$ and $c$ such that for a convex body $K$ in $\mathbb{R}^n$,

$$\ell(K^\circ) \le C \int_{c M_K^*}^\infty \sqrt{\log N(K, B_2^n, \varepsilon)}\, d\varepsilon.$$
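Proof. By Dudley's inequality, $\ell(K^\circ) \le C_1 \int_0^\infty \sqrt{\log N(K, B_2^n, \varepsilon)}\, d\varepsilon$. Hence, it suffices to show that there is some absolute constant $c$ for which

$$C_1 \int_0^{c M_K^*} \sqrt{\log N(K, B_2^n, \varepsilon)}\, d\varepsilon \le \tfrac{1}{2} \ell(K^\circ). \tag{13}$$

To that end, note that for every $\varepsilon > 0$,

$$N(K, B_2^n, \varepsilon) \le \Big(1 + \frac{2 M_K^*}{\varepsilon}\Big)^n. \tag{14}$$

Indeed, by a standard volumetric argument and Urysohn's inequality,

$$\big(N(K, B_2^n, \varepsilon)\big)^{1/n} \le \frac{2}{\varepsilon} \Big(\frac{\mathrm{vol}(K + \frac{\varepsilon}{2} B_2^n)}{\mathrm{vol}(B_2^n)}\Big)^{1/n} \le \frac{2}{\varepsilon} \big(M_K^* + M_{\frac{\varepsilon}{2} B_2^n}^*\big) = \frac{2}{\varepsilon} M_K^* + 1.$$

Thus, by (14), the left-hand side of (13) is bounded by

$$C_1 \sqrt{n} \int_0^{c M_K^*} \log^{1/2}\Big(1 + \frac{2}{\varepsilon} M_K^*\Big)\, d\varepsilon,$$

which, after a change of variables, is majorized by

$$2 C_1 n^{1/2} M_K^* \int_0^{c/2} \log^{1/2}(1 + 1/t)\, dt \le C_1 n^{1/2} M_K^* (c/2)^{1/2} \le \tfrac{1}{2} \ell(K^\circ)$$

for an appropriate choice of $c$. ■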



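Proof of Theorem 4.1. By Lemma 4.2, there exist absolute constants $C$ and $c$ such that

$$E = \ell(K^\circ) \le C \int_{cE/\sqrt{n}}^\infty \sqrt{\log N(K, B_2^n, t)}\, dt.$$

Since $K \subset \sqrt{n} B_2^n$, the integrand vanishes for all $t \ge \sqrt{n}$. Therefore, using Corollary 2.7,

$$E \le C \int_{cE/\sqrt{n}}^{\sqrt{n}} \sqrt{\log N(K, B_2^n, t)}\, dt = C \sqrt{n} \int_{cE/n}^1 \sqrt{\log N(K, \sqrt{n} B_2^n, t)}\, dt \le C \sqrt{n} \int_{cE/n}^1 \sqrt{VC(K, ct)} \cdot \log(2/t)\, dt,$$

as claimed. ■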
The main corollary we derive from Theorem 4.1 is Elton's Theorem with the optimal dependence on $\delta$.

Theorem 4.3 There is an absolute constant $c$ for which the following holds. Let $x_1, \dots, x_n$ be vectors in the unit ball of a Banach space. Assume that for some $\delta > 0$

$$\mathbb{E}\Big\|\sum_{i=1}^n \varepsilon_i x_i\Big\| \ge \delta n.$$

Then there exist two numbers, $0 < s \le 1$ and $c\delta \le t \le 1$, which satisfy $\sqrt{s}\, t \log^{2.5}(2/t) \ge \delta$, and a subset $\sigma \subset \{1, \dots, n\}$ of cardinality $|\sigma| \ge sn$, such that

$$\Big\|\sum_{i \in \sigma} a_i x_i\Big\| \ge t \sum_{i \in \sigma} |a_i| \quad \text{for all scalars } (a_i). \tag{15}$$

In particular, we always have $s \ge c\delta^2$ and $t \ge c\delta$.

Proof of Theorem 4.3. By a perturbation argument, we may assume that the vectors $(x_i)_{i \le n}$ are linearly independent. Hence, using an appropriate linear transformation we can assume that $X = (\mathbb{R}^n, \|\cdot\|)$ and that $(x_i)_{i \le n}$ are the unit coordinate vectors $(e_i)_{i \le n}$ in $\mathbb{R}^n$. Let $K = (B_X)^\circ$ and note that since $\|e_i\|_X \le 1$ then $B_1^n \subset K^\circ$. Therefore, $K \subset B_\infty^n \subset \sqrt{n} B_2^n$.

Let $E = \mathbb{E}\|\sum_{i=1}^n g_i e_i\|_X$. Since $K \subset B_\infty^n$, then by Theorem 4.1 there are absolute constants $C_0$ and $c_0$ such that

$$\delta n \le E \le C_0 \sqrt{n} \int_{c_0 \delta}^1 \sqrt{VC(K, c_0 t)} \cdot \log(2/t)\, dt.$$

Consider the function

$$h(t) = \frac{c}{t \log^{1.5}(2/t)},$$

where the absolute constant $c > 0$ is chosen so that $\int_0^1 h(t)\, dt = 1$. It follows that there exists some $c_0 \delta \le t \le 1$ such that

$$\sqrt{VC(K, c_0 t)/n} \cdot \log(2/t) \ge \delta h(t).$$

Hence

$$VC(K, c_0 t)/n \ge \frac{c \delta^2}{t^2 \log^5(2/t)}.$$
Therefore, letting $s = VC(K, c_0 t)/n$ we see that the announced relation between $s$ and $t$ holds, and that there exists a subset $\sigma \subset \{1, \dots, n\}$ of cardinality $|\sigma| \ge sn$ such that $(c_0 t / 2) B_\infty^\sigma \subset P_\sigma K$. Dualizing, we have $(c_0 t / 2)(K^\circ \cap \mathbb{R}^\sigma) \subset B_1^\sigma$, which completes the proof of the main part of the theorem.

The "In particular" part follows trivially. ■

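Remarks. Firstly, as the proof shows, the exponent 2.5 can be reduced to any number larger than 2. Secondly, the relation between $s$ and $t$ in Theorem 4.3 is optimal up to a logarithmic factor for all $0 < \delta < 1$. This is seen from the following example, shown to us by Mark Rudelson. For $0 < \delta < 1/\sqrt{n}$, the constant vectors $x_i = \delta \sqrt{n} \cdot e_1$ in $X = \mathbb{R}$ show that $st^2$ in Theorem 4.3 can not exceed $\delta^2$. For $1/\sqrt{n} \le \delta \le 1$, we consider the body $D = \mathrm{conv}\big(B_1^n \cup \frac{1}{\delta \sqrt{n}} B_2^n\big)$ and let $X = (\mathbb{R}^n, \|\cdot\|_D)$ and $x_i = e_i$, $i = 1, \dots, n$.

Clearly, $\mathbb{E}\|\sum \varepsilon_i x_i\|_X \ge \delta \sqrt{n}\, \mathbb{E}\|\sum \varepsilon_i e_i\|_2 = \delta n$. Let $0 < s, t < 1$ be so that (15) holds for some subset $\sigma \subset \{1, \dots, n\}$ of cardinality $|\sigma| \ge sn$. This means that $\|x\|_D \ge t \|x\|_1$ for all $x \in \mathbb{R}^\sigma$. Dualizing, we have $\frac{t}{\delta \sqrt{n}} \|x\|_2 \le t \|x\|_{D^\circ} \le \|x\|_\infty$ for all $x \in \mathbb{R}^\sigma$. Testing this inequality for $x = \sum_{i \in \sigma} e_i$, we obtain $\frac{t}{\delta \sqrt{n}} \sqrt{|\sigma|} \le 1$. This means that $st^2 \le \delta^2$.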



The next application of Theorem 4.1 is an improvement of a result of M. Talagrand [T] which compares the average over the $\pm$ signs to the minimum over the $\pm$ signs of $\|\sum_{i=1}^n \pm x_i\|$.

Corollary 4.4 Let $x_1, \dots, x_n$ be vectors in the unit ball of a Banach space, and let $M > 0$. Fix a number $0 < \lambda < \log^{-4}(n/M^2)$ and assume that

$$\min_{\eta_i = \pm 1} \Big\|\sum_{i \in \sigma} \eta_i x_i\Big\| \le M |\sigma|^{1/2} \quad \text{for all } \sigma \text{ with } |\sigma| \le \lambda n.$$

Then

$$\mathbb{E}\Big\|\sum_{i=1}^n g_i x_i\Big\| \le C M (n/\lambda)^{1/2}$$

for some absolute constant $C$.

Proof. As we did before, we can assume that our Banach space is $X = (\mathbb{R}^n, \|\cdot\|)$, that $(x_i)_{i \le n}$ are the unit coordinate vectors in $\mathbb{R}^n$, and set $K = B_{X^*}$. The hypothesis of the lemma implies that $VC(K, M v^{-1/2}) \le v$ if $0 < v \le \lambda n$, hence

$$VC(K, t) \le (M/t)^2 \quad \text{for } M (\lambda n)^{-1/2} \le t \le 1. \tag{16}$$

Let $E = \mathbb{E}\|\sum_{i=1}^n g_i x_i\|_X$. By Theorem 4.1, there are absolute constants $C$ and $c$ such that

$$E \le C \sqrt{n} \int_{cE/n}^1 \sqrt{VC(K, ct)} \cdot \log(2/t)\, dt.$$

If $cE/n \le M (\lambda n)^{-1/2}$, the corollary trivially follows. Otherwise, if the converse inequality holds, then by (16),

$$E \le C \sqrt{n} \int_{cE/n}^1 (M/t) \log(2/t)\, dt \le C \sqrt{n} M \cdot \log^2(n/cE),$$

and by the assumption on $\lambda$,

$$E \le C \sqrt{n} M \cdot \log^2(n/M^2) \le C \sqrt{n} M \cdot \lambda^{-1/2},$$

as claimed. ■



Now we apply Corollary 4.4 to compare the type 2 constant $T_2(X)$ to the infratype 2 constant $I_2(X)$ of a Banach space $X$.

Let $T_2^{(n)}(X)$ and $I_2^{(n)}(X)$ denote the best possible constants $M$ in (4) and (5), respectively (with $p = 2$). So, $T_2^{(n)}(X)$ and $I_2^{(n)}(X)$ measure the type/infratype 2 computed on $n$ vectors. Clearly, $I_2(X) \le T_2(X)$ and $I_2^{(n)}(X) \le T_2^{(n)}(X)$.

Corollary 4.5 Let $X$ be an $n$-dimensional Banach space. Then, for every number $0 < \lambda < \log^{-4}(n/I_2(X)^2)$,

$$T_2(X) \le C \lambda^{-1/2} \cdot I_2^{(\lambda n)}(X).$$

In particular, we obtain

$$T_2(X) \le I_2(X) \cdot C \log^2\Big(\frac{n}{I_2(X)^2}\Big) \le I_2(X) \cdot C \log^2 n.$$
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Proof. By [ [l'J|| and [BKT| Theorem 3.1, the gaussian type 2 can be com- 
puted on n vectors of norm one. Precisely, this means that the constant 
T2(X) equals the smallest possible constant M' for which the inequality 



n 



m^QiXi <M'n 



1/2 



i=l 



holds for all vectors xi, . . . of norm one. Our assertion follows from 



5 The fat-shattering dimension and covering 

One of the important combinatorial parameters used to measure the "com- 
plexity" of a class of functions is the fat-shattering dimension, which is a 
scale-sensitive version of the Vapnik-Chervonenkis dimension. 

Definition 5.1 For every $\varepsilon > 0$, a set $A = \{x_1, \ldots, x_n\} \subset \Omega$ is said to be $\varepsilon$-shattered by $F$ if there is some function $\gamma : A \to \mathbb{R}$, such that for every $I \subset \{1, \ldots, n\}$ there is some $f_I \in F$ for which $f_I(x_i) \ge \gamma(x_i) + \varepsilon$ if $i \in I$, and $f_I(x_i) \le \gamma(x_i) - \varepsilon$ if $i \notin I$. Let

\[ \mathrm{fat}_\varepsilon(F, \Omega) = \sup\bigl\{ |A| : A \subset \Omega,\ A \text{ is } \varepsilon\text{-shattered by } F \bigr\}. \]

In cases where the domain is clear, we denote the fat-shattering dimension of $F$ by $\mathrm{fat}_\varepsilon(F)$.

If $F$ happens to be a class of Boolean functions, then by selecting $\gamma(x_i) = 1/2$ we see that $\mathrm{fat}_\varepsilon(F, \Omega) = VC(F)$ for every $\varepsilon < 1/2$, where $VC(F)$ is the classical Vapnik-Chervonenkis dimension.

Note that the fat-shattering dimension may be controlled by the generalized VC-dimension, in the following sense. Assume that $F$ is a subset of the unit ball in $L_\infty(\Omega)$, which is denoted by $B(L_\infty(\Omega))$. Let $s_n = \{x_1, \ldots, x_n\}$ be a subset of $\Omega$ and set $F/s_n = \{ (f(x_1), \ldots, f(x_n)) \mid f \in F \} \subset \mathbb{R}^n$. If $VC(F/s_n, t) = m$, there is a subset $\sigma \subset \{1, \ldots, n\}$ of cardinality $m$ such that $P_\sigma F/s_n \supset \prod_{i \in \sigma} \{a_i, b_i\}$ where $|b_i - a_i| \ge t$. By selecting $\gamma(x_i) = (b_i + a_i)/2$ it is clear that $(x_i)_{i \in \sigma}$ is $t/2$-shattered by $F$, and thus

\[ VC(F/s_n, t) \le \mathrm{fat}_{t/2}(F, \Omega). \]
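On small finite examples the definition of $\varepsilon$-shattering can be checked by brute force. The following is a minimal sketch, assuming the class is given as a finite list of value tuples on points indexed 0, ..., n-1; the names `is_shattered` and `fat_dimension` are illustrative and do not come from the paper. It relies on the observation that, for a finite class, a witness can always be taken of the form $\gamma(x_i) = v - \varepsilon$ with $v$ a value attained at $x_i$: raising $\gamma(x_i)$ to the nearest such level keeps the "above" functions and can only enlarge the "below" ones. Comparisons are exact, so inputs are assumed to be exactly representable numbers.

```python
from itertools import combinations, product


def is_shattered(F, A, eps):
    """Return True if the index set A is eps-shattered by the finite class F.

    F is a list of tuples; F[k][i] is the value of the k-th function at point i.
    """
    A = list(A)
    # Candidate witnesses gamma(i) = v - eps, v a value attained at point i.
    # Then f(i) >= gamma(i) + eps reads f(i) >= v, and
    # f(i) <= gamma(i) - eps reads f(i) <= v - 2 * eps.
    candidates = [sorted({f[i] for f in F}) for i in A]
    for levels in product(*candidates):
        patterns = set()
        for f in F:
            signs = []
            for i, v in zip(A, levels):
                if f[i] >= v:
                    signs.append(1)
                elif f[i] <= v - 2 * eps:
                    signs.append(0)
                else:
                    signs = None  # f falls in the forbidden band at point i
                    break
            if signs is not None:
                patterns.add(tuple(signs))
        if len(patterns) == 2 ** len(A):
            return True
    return False


def fat_dimension(F, n_points, eps):
    """Largest size of an eps-shattered subset of {0, ..., n_points - 1}."""
    best = 0
    for m in range(1, n_points + 1):
        if any(is_shattered(F, A, eps) for A in combinations(range(n_points), m)):
            best = m
        else:
            break  # subsets of shattered sets are shattered, so we can stop
    return best


# A Boolean class that realizes every pattern on points 0 and 1 only.
boolean_class = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)]
print(fat_dimension(boolean_class, 3, 0.5))  # 2
```

The search is exponential in the number of points and is meant only to illustrate the definition, not as a practical algorithm.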
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The aim of this section is to bound the entropy of F with respect to 
empirical $L_2$ norms. If $s_n = \{x_1, \ldots, x_n\}$ let $\mu_n$ be the empirical measure supported on $s_n$, that is $\mu_n = n^{-1}\sum_{i=1}^n \delta_{x_i}$, where $\delta_{x_i}$ is the point evaluation functional on $x_i$. Empirical covering numbers play a central role in the theory of empirical processes. They can be used to characterize classes which satisfy the uniform law of large numbers (see [0 or [VW] for a detailed discussion). It turns out that if $F \subset B(L_\infty(\Omega))$ then $F$ satisfies the uniform law of large numbers with respect to all probability measures if and only if $\sup_{\mu_n} \log N(F, L_2(\mu_n), \varepsilon) = o(n)$ for every $\varepsilon > 0$, where the supremum is taken with respect to all empirical measures supported on at most $n$ elements of $\Omega$. In [ABCH] it was shown that $F \subset B(L_\infty(\Omega))$ satisfies the uniform law of large numbers if and only if $\mathrm{fat}_\varepsilon(F, \Omega) < \infty$ for every $\varepsilon > 0$.

Another important application of covering numbers estimates is the anal- 
ysis of the uniform central limit property. 

Definition 5.2 Let $F \subset B(L_\infty(\Omega))$, set $P$ to be a probability measure on $\Omega$ and assume $G_P$ to be a gaussian process indexed by $F$, which has mean $0$ and covariance

\[ \mathbb{E}\, G_P(f) G_P(g) = \int fg\, dP - \int f\, dP \int g\, dP. \]

A class $F$ is called a universal Donsker class if for any probability measure $P$ the law $G_P$ is tight in $\ell_\infty(F)$ and $\nu_n^P = n^{1/2}(P_n - P) \in \ell_\infty(F)$ converges in law to $G_P$ in $\ell_\infty(F)$.

A property stronger than the universal Donsker property is called uniform Donsker. For such classes, $\nu_n^P$ converges to $G_P$ uniformly in $P$ in some sense. Instead of presenting the formal definition of the uniform Donsker property, we mention the following result of Giné and Zinn [GZ], which characterizes such classes. Before presenting the result, we introduce the following notation: for every probability measure $P$ on $\Omega$, let $\rho_P^2(f, g) = \mathbb{E}_P(f - g)^2 - (\mathbb{E}_P(f - g))^2$, and for every $\delta > 0$, set $F_\delta = \{ f - g \mid f, g \in F,\ \rho_P(f, g) \le \delta \}$.



Theorem 5.3 [GZ] $F$ is a uniform Donsker class if and only if the following holds: for every probability measure $P$ on $\Omega$, $G_P$ has a version with bounded, $\rho_P$-uniformly continuous sample paths, and for these versions,

\[ \sup_P \mathbb{E} \sup_{f \in F} |G_P(f)| < \infty, \qquad \lim_{\delta \to 0} \sup_P \mathbb{E} \sup_{h \in F_\delta} |G_P(h)| = 0. \]
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It is possible to show that the uniform Donsker property is connected to 
estimates on covering numbers. 

Theorem 5.4 Let $F \subset B(L_\infty(\Omega))$. If

\[ \int_0^\infty \sup_n \sup_{\mu_n} \sqrt{\log N(F, L_2(\mu_n), \varepsilon)}\, d\varepsilon < \infty, \]

then $F$ is a uniform Donsker class.

Having this entropy condition in mind, it is natural to try to find covering numbers estimates which are "dimension free", that is, do not depend on the size of the sample. In the Boolean case, such bounds were first obtained by Dudley (see [Pi], Theorem 14.13), and then improved by Haussler [Ha], [VW], who showed that for any empirical measure $\mu_n$ and any Boolean class $F$,

\[ N(F, L_2(\mu_n), \varepsilon) \le Cd(4e)^d \varepsilon^{-2d}, \]

where $C$ is an absolute constant and $d = VC(F)$. In particular this shows that every VC class is a uniform Donsker class.
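To see why Haussler's estimate gives the uniform Donsker property, one can check the entropy integral directly. The computation below is a sketch that assumes $Cd \ge 1$ and uses that a Boolean class has $L_2(\mu_n)$ diameter at most 1, so only $0 < \varepsilon \le 1$ matters.

```latex
\begin{align*}
\log N(F, L_2(\mu_n), \varepsilon)
  &\le \log(Cd) + d \log(4e) + 2d \log\frac{1}{\varepsilon},
  \qquad 0 < \varepsilon \le 1, \\
\int_0^1 \sqrt{\log N(F, L_2(\mu_n), \varepsilon)}\, d\varepsilon
  &\le \sqrt{\log(Cd) + d \log(4e)}
     + \sqrt{2d} \int_0^1 \sqrt{\log\frac{1}{\varepsilon}}\, d\varepsilon \\
  &= \sqrt{\log(Cd) + d \log(4e)} + \sqrt{\frac{\pi d}{2}},
\end{align*}
```

where the last step uses $\int_0^1 \sqrt{\log(1/\varepsilon)}\, d\varepsilon = \Gamma(3/2) = \sqrt{\pi}/2$. The bound does not depend on $n$ or on $\mu_n$, which is the kind of uniform control an entropy integral condition asks for.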

Our goal is to obtain dimension-free estimates on the $L_2$ covering numbers of subsets of $B(L_\infty(\Omega))$ using their fat-shattering dimension, since in many cases it is easier to compute this parameter than to bound the covering numbers (see, e.g. [AB]).

Let $F \subset B(L_\infty(\Omega))$ and fix a set $s_n \subset \Omega$. For every $f \in F$ let $f/s_n = \sum_{i=1}^n f(x_i) e_i \in F/s_n$. Clearly, $\|f - g\|_{L_2(\mu_n)} = \|f/s_n - g/s_n\|_{\sqrt{n} B_2^n}$, implying that for every $t > 0$,

\[ N(F, L_2(\mu_n), t) = N(F/s_n, \sqrt{n} B_2^n, t). \tag{17} \]
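The identification of the empirical $L_2$ distance with a rescaled Euclidean distance is easy to check numerically. The sketch below assumes a finite class given by its value tuples on the sample; the helper names `empirical_l2`, `scaled_euclidean` and `greedy_cover_size` are illustrative. The greedy procedure builds a $t$-net with centers in the class, so its size is only an upper bound on the covering number (with centers restricted to the class), not its exact value.

```python
import math


def empirical_l2(f_vals, g_vals):
    """Distance between f and g in L2(mu_n), mu_n uniform on the n sample points."""
    n = len(f_vals)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_vals, g_vals)) / n)


def scaled_euclidean(f_vals, g_vals):
    """Norm of f/s_n - g/s_n whose unit ball is sqrt(n) * B_2^n."""
    n = len(f_vals)
    euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(f_vals, g_vals)))
    return euclid / math.sqrt(n)


def greedy_cover_size(F, t):
    """Size of a greedily built t-net of F in L2(mu_n)."""
    centers = []
    for f in F:
        # Keep f as a new center only if no existing center is within distance t.
        if all(empirical_l2(f, c) > t for c in centers):
            centers.append(f)
    return len(centers)


cube = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(empirical_l2(cube[0], cube[3]), scaled_euclidean(cube[0], cube[3]))
print([greedy_cover_size(cube, t) for t in (0.5, 0.8, 1.0)])
```

Every element of the class lies within distance $t$ of some chosen center, and the centers are pairwise more than $t$ apart, so the same list also serves as a $t$-separated set.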
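Finally, note that for any $t > 0$,

\[ VC(F/s_n + \tfrac{t}{4} B_\infty^n, t) \le \mathrm{fat}_{t/2}(F/s_n + \tfrac{t}{4} B_\infty^n) \le \mathrm{fat}_{t/4}(F/s_n) \le \mathrm{fat}_{t/4}(F). \tag{18} \]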



Theorem 5.5 There is an absolute constant $C$ such that for any class $F \subset B(L_\infty(\Omega))$, any integer $n$, every empirical measure $\mu_n$ and every $t > 0$,

\[ \log N(F, L_2(\mu_n), t) \le C\, \mathrm{fat}_{t/8}(F) \log^2(2/t). \]

Proof. Let $s_n = \{x_1, \ldots, x_n\}$ be the points on which $\mu_n$ is supported, and apply Corollary 3.6 for the set $F/s_n$. We obtain

\[ \log N(F/s_n, \sqrt{n} B_2^n, t) \le C \log^2(2/t) \cdot VC(F/s_n + \tfrac{t}{8} B_\infty^n, t/2). \]

Then our claim follows from (17) and (18). ■

Remark. It is possible to show that this bound is essentially tight. Indeed, fix a class $F \subset B(L_\infty(\Omega))$ and put $E(t) = \sup_n \sup_{\mu_n} \log N(F, L_2(\mu_n), t)$ (that is, the supremum is taken with respect to all the empirical measures supported on a finite set). By Theorem 5.5, $E(t) \le C\, \mathrm{fat}_{t/8}(F, \Omega) \log^2(2/t)$. On the other hand it was shown in [Me] that $E(t) \ge c\, \mathrm{fat}_{16t}(F, \Omega)$ for some absolute constant $c$.

Comparing the result to Haussler's estimate, one can see that his bound is recovered up to one logarithmic factor in $1/t$ and the absolute constant. Indeed, this holds since VC classes satisfy that $VC(F) = \mathrm{fat}_t(F)$ for any $0 < t < 1/2$.

Now we obtain the following corollary, which extends Dudley's result from 
VC classes to the real valued case. 

Corollary 5.6 Let $F \subset B(L_\infty(\Omega))$ and assume that the integral

\[ \int_0^1 \sqrt{\mathrm{fat}_{t/8}(F)}\, \log\frac{2}{t}\, dt \]

converges. Then $F$ is a uniform Donsker class.

In particular this shows that if $\mathrm{fat}_\varepsilon(F)$ is "slightly better" than $1/\varepsilon^2$ then $F$ is a uniform Donsker class.
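A power-type example makes the threshold concrete. The sketch below reads the integrand of the corollary as $\sqrt{\mathrm{fat}_{t/8}(F)}\,\log(2/t)$ on $(0, 1]$ and assumes a hypothetical growth bound $\mathrm{fat}_\varepsilon(F) \le A\varepsilon^{-p}$ with constants $A > 0$ and $0 < p < 2$; neither $A$ nor $p$ appears in the paper.

```latex
\begin{align*}
\sqrt{\mathrm{fat}_{t/8}(F)} &\le \sqrt{A}\, 8^{p/2}\, t^{-q}, \qquad q = p/2 \in (0, 1), \\
\int_0^1 t^{-q} \log\frac{2}{t}\, dt
  &= \log 2 \int_0^1 t^{-q}\, dt + \int_0^1 t^{-q} \log\frac{1}{t}\, dt
   = \frac{\log 2}{1 - q} + \frac{1}{(1 - q)^2} < \infty .
\end{align*}
```

For $p = 2$ the integrand is of order $t^{-1}\log(2/t)$ and the same integral diverges, which is the sense in which the growth rate $\varepsilon^{-2}$ is the borderline case.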
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